29 research outputs found
Can You Follow Me? Testing Situational Understanding in ChatGPT
Understanding sentence meanings and updating information states appropriately
across time -- what we call "situational understanding" (SU) -- is a critical
ability for human-like AI agents. SU is essential in particular for chat
models, such as ChatGPT, to enable consistent, coherent, and effective dialogue
between humans and AI. Previous work has identified certain SU limitations in
non-chatbot large language models (LLMs), but the extent and causes of these
limitations are not well understood, and the capabilities of current chat-based
models in this domain have not been explored. In this work we tackle these
questions, proposing a novel synthetic environment for SU testing that enables
controlled and systematic testing of SU in chat-oriented models through
assessment of models' ability to track and enumerate environment states. Our
environment also allows close analysis of the dynamics of model performance, to
better understand the underlying causes of performance patterns.
We apply our test to ChatGPT, a state-of-the-art chatbot, and find that,
despite the fundamental simplicity of the task, the model's performance
reflects an inability to retain correct environment states over time. Our
follow-up analyses suggest that the performance degradation arises largely
because ChatGPT's in-context memory is non-persistent (even though it can
access the full dialogue history) and because it is susceptible to hallucinated
updates, including updates that artificially inflate accuracies. Our findings
suggest overall that ChatGPT is not currently equipped for robust tracking of
situation states, and that trust in its impressive dialogue performance comes
with risks.
We release the codebase for reproducing our test environment, as well as all
prompts and API responses from ChatGPT, at
https://github.com/yangalan123/SituationalTesting. (EMNLP 2023 Main Paper, Camera Ready)
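The abstract does not spell out the test environment's implementation; as a purely hypothetical illustration of what a synthetic state-tracking probe of this kind could look like (all names, objects, and scoring choices here are invented, not the paper's actual setup), one can generate a sequence of update instructions together with the ground-truth environment state after each step, then score a model's enumerated state against that ground truth:

```python
import random

def simulate_environment(n_steps=5, objects=("apple", "book", "key"),
                         boxes=("A", "B"), seed=0):
    """Generate move instructions and the ground-truth environment
    state (object -> box) after each step. A toy state-tracking probe."""
    rng = random.Random(seed)
    state = {obj: boxes[0] for obj in objects}  # everything starts in box A
    history = []
    for _ in range(n_steps):
        obj = rng.choice(objects)
        dest = rng.choice(boxes)
        state[obj] = dest
        instruction = f"Move the {obj} to box {dest}."
        history.append((instruction, dict(state)))  # snapshot ground truth
    return history

def score_enumeration(predicted, gold):
    """Fraction of objects whose location was enumerated correctly."""
    return sum(predicted.get(o) == b for o, b in gold.items()) / len(gold)

if __name__ == "__main__":
    for instruction, gold in simulate_environment():
        print(instruction, gold)
```

Feeding the accumulated instructions to a chat model and asking it to enumerate the current state at each step would then let `score_enumeration` track how accuracy degrades over dialogue turns.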
Relating lexical and syntactic processes in language: Bridging research in humans and machines
The potential to bridge research on language in humans and machines is substantial: as linguists and cognitive scientists apply scientific theory and methods to understand how language is processed and represented by humans, computer scientists apply computational methods to determine how to process and represent language in machines. The present work integrates approaches from each of these domains to tackle an issue of relevance to both: the nature of the relationship between low-level lexical processes and syntactically driven interpretation processes. In the first part of the dissertation, this distinction between lexical and syntactic processes focuses on understanding asyntactic lexical effects in online sentence comprehension in humans, and the relationship of those effects to syntactically driven interpretation processes; I draw on computational methods to simulate these lexical effects and their relationship to interpretation processes. In the latter part of the dissertation, the lexical/syntactic distinction focuses on the application of semantic composition to complex lexical content for the derivation of sentence meaning. For this work I draw on methodology from cognitive neuroscience and linguistics to analyze the capacity of natural language processing systems to perform vector-based sentence composition, in order to improve the ability of models to compose and represent sentence meaning.
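The dissertation's specific composition methods are not described here; as a minimal sketch of the general idea of vector-based sentence composition (toy embeddings and additive composition chosen purely for illustration, not the author's actual models), a sentence vector can be built by combining its word vectors and compared to other sentences by cosine similarity:

```python
import numpy as np

# Toy 3-dimensional word embeddings; values are invented for illustration.
embeddings = {
    "the":   np.array([0.1, 0.0, 0.2]),
    "dog":   np.array([0.9, 0.3, 0.1]),
    "cat":   np.array([0.8, 0.2, 0.2]),
    "barks": np.array([0.2, 0.8, 0.4]),
}

def compose_sentence(tokens, emb):
    """Additive composition baseline: the sentence vector is the
    mean of its word vectors."""
    return np.mean([emb[t] for t in tokens], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

s1 = compose_sentence(["the", "dog", "barks"], embeddings)
s2 = compose_sentence(["the", "cat", "barks"], embeddings)
```

Averaging is only one simple baseline; analyses like those described above probe whether learned composition functions capture sentence meaning beyond what such additive schemes provide.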